Abstract

A method is proposed for distinguishing highly boosted hadronically decaying W's (W-jets)
from QCD-jets using jet substructure. Previous methods, such as the filtering/mass-drop method,
can give a factor of ~2 improvement in $S/\sqrt{B}$ for jet $p_T > 200$ GeV. In contrast, a multivariate
approach including new discriminants such as R-cores, which characterize the shape of the W-jet,
subjet planar flow, and grooming sensitivities is shown to provide a much larger factor of ~5
improvement in $S/\sqrt{B}$. For longitudinally polarized W's, such as those coming from many new
physics models, the discrimination is even better. Comparing different Monte Carlo simulations, we
observe a sensitivity of some variables to the underlying event; however, even with conservative
estimates, the multivariate approach is very powerful. Applications to semileptonic WW resonance
searches and all-hadronic W+jet searches at the LHC are also discussed. Code implementing our
W-jet tagging algorithm is publicly available at http://jets.physics.harvard.edu/wtag



1. INTRODUCTION 



Highly energetic W and Z bosons appear in many interesting physics processes at the 
TeV scale to be explored at the Large Hadron Collider (LHC). For example, WW scattering 
at high energy is a direct probe of the electroweak breaking mechanism 1,0. Heavy 
resonances, such as W', a heavy Higgs or fourth generation quarks, often decay
to electroweak gauge bosons. Since the energy scales of these processes are much higher
than the electroweak scale, the W and Z bosons are often highly boosted. When decaying 
hadronically, a highly boosted W or Z boson then appears as a single jet, called a W-jet 
or Z-jet. Since high energy QCD-jets (jets initiated by a quark or gluon) will be copiously 
produced at the LHC, W or Z-jets may be overwhelmed by the QCD background, making 
it difficult to explore the nature of TeV scale physics. Therefore, being able to distinguish 
efficiently W and Z-jets from QCD-jets could significantly improve our ability to understand
the nature of TeV scale physics. 

A number of recent studies have explored the hadronic decays of boosted objects, including
not only W's and Z's [1-7] but also boosted light Higgses [8-17] and top quarks [15, 18-24].
These studies have led to a general understanding of some of the essential
differences between a QCD-jet and a jet initiated from a boosted massive particle decay. For 
example, a massive particle decay often contains more than one hard subjet, i.e. regions 
within the jet where energy is concentrated. On the contrary, the energy distribution of a 
QCD-jet is more often dominated by one and only one such region. Due to collinear singu- 
larities, QCD-jets tend to comprise particles with hierarchical energies, while the energies of 
particles in a massive particle jet are usually more balanced. These ideas were used in one of 
the first jet-substructure studies, Ref. Q, which attempted to identify W-jets in WW scattering.
Some of the most poignant applications of substructure techniques include reviving the light
Higgs to $b\bar{b}$ search [8], which has been implemented by ATLAS, and reducing the
backgrounds to boosted hadronic tops by a factor of 10,000 [19], which was implemented
by CMS Q].

Boosted jets are often highly collimated, with characteristic sizes of order R = 0.4 or 
smaller. The basic trick to using jet substructure is, rather than starting with R = 0.4 jets, 
one starts with much larger jets, say R = 1.2, and then parses the jet using its clustering 
history. The goal is to keep decay products from the boosted object, throwing out contamination
from initial state radiation and the underlying event. Some general algorithms for doing this
include filtering [8], trimming [27], and pruning [4]. While these grooming techniques seem to
help, it is not clear they are in any way optimal. It was shown in [11] that the different
methods extract overlapping but also at least partially complementary information. In [28],
it was shown that even one algorithm, trimming, is at least partially complementary to itself
if different sets of parameters are used. Moreover, an interesting but underappreciated point
about grooming that we demonstrate here (see Figure 1) is that grooming, by itself, does not
produce significance improvements much better than simply using narrower jets. For example,
while filtering with a mass-drop criterion can produce up to a factor of 2.3 improvement in
$S/\sqrt{B}$ in a $p_T \sim 500$ GeV boosted-W sample, simply using a narrow jet size
(R = 0.4) can itself already do nearly as well, with an $S/\sqrt{B}$ improvement of order 2.

It is the goal of this paper to explore the optimization of boosted W-tagging by using
much more of the jets' substructure than what comes out of grooming. For example, the decay
products of a highly boosted W are confined to a small region around the W momentum,
while the radiation of a QCD-jet with the same $p_T$ is much more scattered. This effect is not
taken into account if we only consider the leading subjets after jet grooming. To optimize
the discriminating power, we attempt a comprehensive examination of the properties of a
decaying color singlet particle and its QCD-jet background. We define a set of variables which
characterize jet radiation patterns. These include what we call mass and $p_T$ R-cores, which
measure how the mass and $p_T$ of a jet change when it is reclustered with different R's. We
also consider variables describing jet shapes including planar flow [3, 15] and pull [34]. In
addition, we do use the jet grooming algorithms to extract some useful information, such as the
masses and $p_T$'s of the groomed jets, the number of subjets, and the subjet $p_T$'s and masses.

To quantify and compare variables, we use the Significance Improvement Characteristic
(SIC) [28], defined as the ratio of the signal efficiency to the square root of the background
efficiency, $\varepsilon_S/\sqrt{\varepsilon_B}$. As discussed in [28], SIC curves facilitate a visual
comparison of various potential discriminants. We find that filtering gives a SIC around 2.0.
Starting from the samples after filtering, the additional shape and substructure variables each
add at most an additional 20% when individually used. However, we find that when the variables
are combined in a multivariate analysis (MVA) using Boosted Decision Trees (BDT), the
significance improvement can be as high as 3.4 to 6.7, for jets with $p_T$ from 200 to 1000 GeV.
In other words, for a signal efficiency of 40%, we can reject around 4 times as much of
FIG. 1: Significance Improvement Characteristics ($\varepsilon_S/\sqrt{\varepsilon_B}$) for leptonic-W + W-jet events (signal)
versus their leptonic-W + QCD-jet background, for $p_T \in (500, 550)$ GeV. The bottom two curves
show the effect of an optimized simple mass window for R = 1.2 and R = 0.4 Cambridge/Aachen
jets. The falloff of the R = 0.4 efficiencies is due to events in which the W-subjets are well
separated. The next curve up shows the efficiency of the filtering-with-mass-drop method of [8],
optimized over the filtering parameters. The top curve is the result of our multivariate analysis,
including many variables on top of the filtered result. The starting point for the multivariate
analysis is a filtered sample with a window slightly wider than what is optimal for filtering, as
indicated by the star.



the background as filtering alone. This allows for substantial improvement in the reach for
diboson resonances, as well as the possibility of seeing the hadronic W-decay mode in the
W+jets sample. Figure 1 shows a summary of our method's efficiency.

This article is organized as follows. In Section 2, the sample we use to optimize W-jet
tagging is described. Section 3 reviews the jet-grooming algorithms and describes to what
extent they are useful for W-jet tagging. Section 4 describes the jet-substructure and jet-shape
variables we use on top of grooming. In Section 5, we describe how to combine the variables
in a multivariate analysis to optimize W-jet tagging. In Section 6, we discuss the difference
in performance for different W polarizations, which has implications for applications to new
physics searches. Then in Section 7, we explore the robustness of our method using different
Monte Carlo tools. Section 8 contains applications to two interesting processes: Z' boson
discovery and W-jet identification in dijet events. We conclude in Section






2. EVENT SAMPLES 



Although we are more interested in boosted W's from new physics, we use the standard
model (SM) processes, WW and W+jet, to illustrate our method. As we will show, the
properties of the W-jet, and therefore the distinguishing power, are fairly insensitive to the
particular process. The results (cuts, parameters, etc.) of our analysis can be applied 
directly to processes with boosted W-jets. It is also straightforward to apply the same 
procedure for other boosted hadronically-decaying particles, such as a Z or Higgs, although 
the optimal cuts will differ. For simplicity, we stick to W's in this work. 

For the optimization procedure we take as the signal process WW production in the
standard model, with one of the W's decaying hadronically and the other one leptonically.
The background is W+jet production with the W decaying leptonically. At large $p_T$, each
signal event contains a W-jet while each background event contains a high $p_T$ QCD-jet.
We simulate the hard WW process in pp collisions at 14 TeV center of mass energy with
both W's decayed using Madgraph/Madevent V4.4.32 [29], which includes the full $2 \to 4$
matrix elements. Thus, spin correlations and polarization effects are included. The Madgraph
events are then fed into Pythia V8.142 [30], where showering, hadronization and the
underlying event are added. The W+jet events are generated with Pythia 8 alone.

In order to simulate the detector response, we divide the ($\eta$, $\phi$) plane into 0.1 × 0.1
calorimeter cells and restrict $\eta$ to be within [-5, 5], roughly corresponding to the hadronic
calorimeter resolution of the LHC detectors. We sum over the energy of particles entering each
calorimeter cell and replace it with a massless particle of the same energy, pointing to the center
of the cell. We have excluded neutrinos and charged leptons from leptonic W decays when
summing over the energy.
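The cell-summing step just described can be sketched as follows. This is only an illustration under stated assumptions, not the code used for the paper: the function name `granularize`, the `(energy, eta, phi)` input layout and the choice of wrapping $\phi$ into $[-\pi, \pi)$ are hypothetical, and since 0.1 does not divide $2\pi$ evenly the last $\phi$ cell in this sketch is narrower than the others. The caller is assumed to have already removed neutrinos and the charged lepton from the leptonic W decay.

```python
import math
from collections import defaultdict

CELL_SIZE = 0.1  # cell width in both eta and phi (value from the text)
ETA_MAX = 5.0    # keep only |eta| < 5 (value from the text)

def granularize(particles):
    """Sum (energy, eta, phi) particles into calorimeter cells.

    Returns one massless pseudo-particle per hit cell as
    (energy, eta_center, phi_center, pt), sorted by cell index.
    The function name and data layout are illustrative assumptions.
    """
    cell_energy = defaultdict(float)
    for energy, eta, phi in particles:
        if abs(eta) >= ETA_MAX:
            continue  # outside the assumed calorimeter acceptance
        phi = (phi + math.pi) % (2.0 * math.pi) - math.pi  # wrap into [-pi, pi)
        key = (math.floor(eta / CELL_SIZE), math.floor(phi / CELL_SIZE))
        cell_energy[key] += energy
    cells = []
    for (i_eta, i_phi), energy in sorted(cell_energy.items()):
        eta_c = (i_eta + 0.5) * CELL_SIZE
        phi_c = (i_phi + 0.5) * CELL_SIZE
        pt = energy / math.cosh(eta_c)  # massless object: E = pT * cosh(eta)
        cells.append((energy, eta_c, phi_c, pt))
    return cells
```

The returned pseudo-particles can then be handed to any jet clustering code in place of the original particles.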

The calorimeter cells are clustered first with a relatively large radius R = 1.2 using the
Cambridge/Aachen algorithm as implemented in FastJet V2.4.2 [31] to identify the high $p_T$
jets. Only the leading jet in each event is kept in our analysis. We then separate the sample
by $p_T$ in 50 GeV bins from 200 GeV to 1050 GeV. We have also included a single bin for
$p_T > 1050$ GeV, to account for higher $p_T$ jets appearing occasionally in the applications
considered in Section 8.¹



¹ Due to PDF suppression, this bin is dominated by jets with $p_T$ just above 1050 GeV and gives similar
results as the (1000, 1050) GeV bin. Special care is needed to optimize extremely high $p_T$ W-jets (>
To characterize the effectiveness of different methods, we first calculate the signal and
background efficiencies. Let $n_S^i$ and $n_B^i$ denote respectively the initial number of signal and
background jets within a particular $p_T$ bin. At the end of our analysis, after various cuts,
we are left with $n_S$ signal jets and $n_B$ background jets. Then the signal and background
efficiencies are defined as

$$\varepsilon_S = \frac{n_S}{n_S^i}, \qquad \varepsilon_B = \frac{n_B}{n_B^i}. \tag{1}$$

Because we compare efficiencies, the conclusions are luminosity-independent. Having a lower
$\varepsilon_B$ at the same value of $\varepsilon_S$ is the indication of a superior discriminant. To visualize the
effectiveness of discriminants, we will look at the Significance Improvement Characteristic

$$\mathrm{SIC} = \frac{\varepsilon_S}{\sqrt{\varepsilon_B}}, \tag{2}$$

which is a rough proxy for the improvement in significance. One advantage of using this
characteristic, as explained in [28], is that it gives a well-defined quantitative measure of
how well a variable does. For a given analysis, one will often choose cuts on a variable
or multivariable discriminant away from the optimal SIC. In that case, for any $\varepsilon_S$, the SIC
curves let you easily read off the corresponding $\varepsilon_B$.
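As a concrete illustration of Eqs. (1) and (2), the following sketch computes the efficiencies and the SIC, and traces a SIC curve by sweeping a threshold on a single discriminant. The function names and the convention that larger scores are more signal-like are assumptions made for this example only.

```python
import math

def efficiencies(n_s_init, n_b_init, n_s_final, n_b_final):
    """Eq. (1): fraction of signal and background jets surviving the cuts."""
    return n_s_final / n_s_init, n_b_final / n_b_init

def sic(eps_s, eps_b):
    """Eq. (2): significance improvement characteristic."""
    if eps_b <= 0.0:
        raise ValueError("SIC is undefined when no background survives")
    return eps_s / math.sqrt(eps_b)

def sic_curve(sig_scores, bkg_scores):
    """One (eps_s, eps_b, SIC) point per cut 'score >= t' (assumed convention)."""
    points = []
    for t in sorted(set(sig_scores) | set(bkg_scores)):
        eps_s, eps_b = efficiencies(
            len(sig_scores), len(bkg_scores),
            sum(s >= t for s in sig_scores),
            sum(b >= t for b in bkg_scores),
        )
        if eps_b > 0.0:  # skip cuts that remove all background
            points.append((eps_s, eps_b, sic(eps_s, eps_b)))
    return points
```

For instance, keeping 40% of the signal while keeping 1% of the background corresponds to a SIC of 4.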

We choose to analyze each $p_T$ bin separately because we eventually want to use our
method to identify boosted W's from new physics processes, which may have a very different
$p_T$ distribution from the SM WW. As we will show, the optimal cuts are $p_T$-dependent, and
we can obtain the best distinguishing power by treating the $p_T$ bins separately.



3. GROOMING: FILTERING, PRUNING AND TRIMMING 

The first step in our optimization procedure is to identify subjets and reduce the number
of background events using existing jet grooming algorithms. These algorithms include
filtering (we always use the mass drop method together with filtering), pruning and trimming.
These algorithms are qualitatively similar but differ in details, which we briefly review in
Appendix A. More details can be found in Refs. [4], [8], [27].

Besides the jet size R one uses to cluster the original jets, each of the three jet groom- 
ing algorithms involves two tunable parameters. We will scan the parameters to maximize 

1200 GeV) because all or most of the decay products can enter the same calorimeter cell, making it very 
difficult to extract the mass. This regime is beyond the scope of this article. 
FIG. 2: Jet masses before and after filtering/mass-drop for $p_T \in (500, 550)$ GeV. The numbers of
events are normalized to be the same for the signal and the background. (a) Before filtering; (b)
after filtering with $\mu = 0.71$ and $y_{\rm cut} = 0.09$. When a mass-drop is not found, we add an entry in
the zero mass bin such that the total number of jets is unchanged.



$n_S/\sqrt{n_B}$, where the numbers of signal and background events after jet grooming are defined
as follows. After jet grooming, the jet mass is always shifted lower, with signal jets concentrated
around the W mass and background jets concentrated around much lower values. See
Figure 2 for an example. Therefore, we can apply a mass window cut to efficiently reduce
the number of background events. Then $n_S$ and $n_B$ are defined as the number of signal and
background events in the mass window.

Obviously, the significance also depends on the mass window we choose, so we scan over
the mass window too. The filtering result presented in Figure 1 is from such scans. For
example, the optimal mass window for $p_T \in (500, 550)$ GeV is $m_{\rm filt} \in (70, 90)$ GeV with
filtering parameters $\mu = 0.71$ and $y_{\rm cut} = 0.09$, where $m_{\rm filt}$ is the jet mass after filtering.
However, as we will further improve the distinguishing power by conducting a multivariate
analysis using jet-substructure variables in the following sections, it is desirable to keep
more events at this stage. Therefore, we choose a relatively large mass window,
$m_{\rm filt} \in (60, 100)$ GeV, and scan the grooming parameters to maximize $n_S/\sqrt{n_B}$ in this
window for all $p_T$'s. It turns out that by doing so we obtain an equal or larger significance
improvement after the multivariate analysis than what we would have gotten with the window
which is optimal for filtering alone.
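A mass-window scan of the kind described above might look like the following minimal sketch. It assumes equally normalized, unweighted signal and background samples and a user-supplied grid of candidate window edges; the function name and the half-open window convention are hypothetical, and the scan over grooming parameters is not shown.

```python
import math
from itertools import combinations

def best_mass_window(sig_masses, bkg_masses, edges):
    """Return (score, lo, hi) maximizing n_S / sqrt(n_B) over windows [lo, hi).

    Windows with no surviving background are skipped, since the
    figure of merit is undefined there. Returns None if no window qualifies.
    """
    best = None
    for lo, hi in combinations(sorted(edges), 2):
        n_s = sum(lo <= m < hi for m in sig_masses)
        n_b = sum(lo <= m < hi for m in bkg_masses)
        if n_b == 0:
            continue
        score = n_s / math.sqrt(n_b)
        if best is None or score > best[0]:
            best = (score, lo, hi)
    return best
```

In a realistic analysis the same loop would sit inside a scan over the grooming parameters, with the groomed masses recomputed for each parameter choice.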

We have scanned the parameters for all three algorithms and all $p_T$ bins. The optimal

FIG. 3: The significance improvement characteristic (SIC $= \varepsilon_S/\sqrt{\varepsilon_B}$) as a function of the filtering
parameters, $\mu$ and $y_{\rm cut}$, for $p_T \in (500, 550)$ GeV.



parameters are given in Table 3 in Appendix A. In Figure 3, we show the contour plot for
the significance improvement characteristics as a function of the filtering parameters $\mu$ and
$y_{\rm cut}$, for $p_T \in (500, 550)$ GeV and $m_{\rm filt} \in (60, 100)$ GeV. Note that the contours do not
close on the right where the significance is insensitive to the $\mu$ parameter. This is because
the $y_{\rm cut}$, which constrains how "unbalanced" the two subjets can be, effectively yields a
lower bound on the mass drop ratio, making larger $\mu$ parameters ineffective. The filtering
parameters that maximize the significance for all $p_T$ bins are shown in Figure 4(a), and the
corresponding signal and background efficiencies, as well as the SICs, are shown in Figure 4(b).
We see that we typically gain a factor of ~2 in significance from filtering using the best
parameters. This is also true for trimming and pruning. See Appendix A for more details.
It turns out that filtering yields slightly better significance. Therefore, in the following, we
will apply the mass window cut $m_{\rm filt} \in (60, 100)$ GeV on the filtered jet mass, and examine
further the events passing the cut.



4. JET SUBSTRUCTURE AND JET SHAPE VARIABLES 

As discussed in the previous section, the first step in our analysis is to require that the
candidate W-jet, after filtering, has a mass $m_{\rm filt} \in (60, 100)$ GeV. Even after this cut, W-jets
and QCD-jets still differ in many aspects. In this section, we define a set of observables which
FIG. 4: Tuning of filtering parameters for W-jets versus QCD-jets in the standard model.



help further boost the significance. Some of these variables have been proposed in recent 
works on jet substructure, as will be briefly reviewed. There are also other variables which 
we find very useful yet have not been mentioned or emphasized in existing references. We 
first classify relevant variables according to the physics they represent, then present results 
based on a set of principal variables which give the major significance gain. As mentioned
before, the discrimination power depends on the jet $p_T$, so we always work on data samples
in separate 50 GeV $p_T$ bins.

Keep in mind that the jets studied in this section are the original unfiltered R = 1.2 "fat"
jets, but we have thrown out jets not passing the filtered mass window. The efficiency for
the filtering mass cut is indicated by the point marked * in Figure 1.



A. Jet and subjet mass

For samples with the same $p_T$, a QCD-jet originates from a highly off-shell quark or
gluon, with no definite mass scale, while a hard jet from resonance decay such as a W-jet is
associated with a definite mass scale $m_W$. As a result, a QCD-jet's mass ($m_{\rm jet}$) is expected
to be roughly proportional to its $p_T$, while the mass of a boosted W is mostly set by $m_W$
with milder dependence on its $p_T$. In the same way, if a jet can be decomposed into two
hard subjets, for example via filtering, the masses of these subjets ($m_{\rm sub}$) are roughly set
by $p_T$ in the case of QCD while by $m_W$ in the case of W-jets. In our samples, both the
QCD-jets and the W-jets have already passed the filtered mass window cut. Nevertheless,
there is still distinguishing power in both $m_{\rm jet}$ and $m_{\rm sub}$. For illustration, see Figure 5. It

FIG. 5: Distributions for the fat-jet mass and hardest subjet mass for signal (W-jets) and
background (QCD-jets) with $p_T \in (500, 550)$ GeV. The edge at 60 GeV in the jet mass plot follows
from a preselection cut on the filtered mass, $m_{\rm filt} \in (60, 100)$ GeV.



is natural to also ask about the relationship between the fat-jet mass and the mass after 
grooming. We call observables describing this relationship grooming sensitivities, and they 
will be described below. 



B. Color connections and R-cores

Another difference between a QCD-jet and a W-jet is that the W-jet originates from a
color singlet, while the QCD-jet does not. By looking at the leading order matrix element
of related processes, one can see that in QCD (for example $q\bar{q} \to g \to q\bar{q}$) final state partons
are color-connected to initial state partons. On the other hand, the two partons from a
W decay are color-connected to each other. This picture is exact at large $N_C$, and gets
$\mathcal{O}(1/N_C^2) \sim 10\%$ corrections in practice. The difference in color-flow was exploited in [34],
which observed that the subsequent radiation pattern had a characteristic first moment
vector which was called pull. Projections of the various pull vectors, such as pull-angles
and pull-size, were shown to have discrimination power. Recently, pull has been measured
by DØ in Z+jet events with $Z \to \nu\bar{\nu}$ [42].



While pull is a useful, general purpose measure of color flow, there may be better ways to
capitalize on the color singlet nature of the W boson in the boosted case. Here we propose
a new set of variables, R-cores, inspired by color connection considerations, but which are
sensitive to aspects of the energy balance in W-jets and QCD-jets as well. For a jet of
given $p_T$ to have mass $m_{\rm jet}$, it must have at least two subjets. The characteristic separation

FIG. 6: Representative R-core distributions for R = 1.2 fat jets with $p_T \in (500, 550)$ GeV and
$m_{\rm filt} \in (60, 100)$ GeV. A dissection of the physics producing these shapes is discussed in the text.

between the subjets is then $\Delta R_{\rm sub} \sim 2 m_{\rm jet}/p_T$. In the case that the jet originates from
a color singlet, one expects the additional radiation to be within this radius, while for a
QCD-jet, which is color-connected to the beam, one expects the additional radiation to be
outside this radius. To characterize this radiation pattern in an infrared safe way, we define
R-cores as follows.

• Recluster the fat-jet with a smaller $R < R_{\rm fat}$.

• Take the highest $p_T$ subjet after reclustering, call its mass $m(R)$ and its transverse
momentum $p_T(R)$.

• The mass R-cores are defined as $c_m(R) = m(R)/m(R_{\rm fat})$.

• The $p_T$ R-cores are defined as $c_{p_T}(R) = p_T(R)/p_T(R_{\rm fat})$.

For the application to boosted W's, we have $R_{\rm fat} = 1.2$ and we consider R-cores with
R = 0.2, 0.3, ..., 1.1. The mass and $p_T$ R-cores tend to carry almost identical information,
and in the end we use only $p_T$ R-cores for the final discriminant, since they work a little
better.
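The R-core recipe above can be sketched in a few dozen lines. To keep the example self-contained it uses a naive hand-written inclusive Cambridge/Aachen clustering (merge the closest pair in rapidity-azimuth until no pair is closer than R) as a stand-in for FastJet; this stand-in, the helper names, and the choice to take the fat jet as the four-vector sum of the supplied constituents are all assumptions of the sketch, not the implementation used for the results in the text.

```python
import math

def four_vec(pt, y, phi):
    """Massless four-vector (px, py, pz, E) from pT, rapidity and azimuth."""
    return (pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(y), pt * math.cosh(y))

def pt(p):
    return math.hypot(p[0], p[1])

def mass(p):
    return math.sqrt(max(0.0, p[3] ** 2 - p[0] ** 2 - p[1] ** 2 - p[2] ** 2))

def rap(p):
    return 0.5 * math.log((p[3] + p[2]) / (p[3] - p[2]))

def delta_r2(a, b):
    dphi = abs(math.atan2(a[1], a[0]) - math.atan2(b[1], b[0]))
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return (rap(a) - rap(b)) ** 2 + dphi ** 2

def cluster_ca(constituents, R):
    """Naive inclusive Cambridge/Aachen: returns jets sorted by decreasing pT."""
    objs = list(constituents)
    while len(objs) > 1:
        d2, i, j = min((delta_r2(objs[i], objs[j]), i, j)
                       for i in range(len(objs)) for j in range(i + 1, len(objs)))
        if d2 >= R * R:
            break  # every remaining object is a final jet
        merged = tuple(a + b for a, b in zip(objs[i], objs[j]))
        objs = [o for k, o in enumerate(objs) if k not in (i, j)] + [merged]
    return sorted(objs, key=pt, reverse=True)

def r_cores(fat_jet_constituents, R):
    """Return (c_m(R), c_pT(R)) for the constituents of one fat jet."""
    fat = tuple(sum(c) for c in zip(*fat_jet_constituents))
    core = cluster_ca(fat_jet_constituents, R)[0]  # hardest subjet at radius R
    return mass(core) / mass(fat), pt(core) / pt(fat)
```

For a toy jet with two hard constituents separated by 0.3 and one soft wide-angle constituent, the $p_T$ R-core jumps from about 0.6 at R = 0.2 to about 0.98 at R = 0.4, which is the two-peak behavior discussed in the text.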
FIG. 7: The average values of $p_T$ R-cores for W-jets and QCD-jets, gauged by the left axis, and
the ratio of the two curves, gauged by the right axis.

Some distributions for mass and $p_T$ R-cores are shown in Figure 6. For large R > 0.5,
we see that the W-jets have their $p_T$ R-cores peaked much more sharply around 1 than the
QCD-jet background. The longer tail of the QCD-jets is characteristic of radiation being
more diffuse away from the center of the jet, as expected from the color-flow picture. As
R is taken smaller, a larger fraction of events in the W-jet case have the two hard subjets
separated by $\Delta R_{\rm sub} > R$. In this case, the $p_T$ of the hardest subjet measures the energy
fraction of the splitting, similar to the z-variable used in [18]. Note that for this $p_T \sim 500$
GeV sample, the characteristic subjet separation is $\Delta R_{\rm sub} \sim 2 m_W/p_T \sim 0.32$. The two-peak
shape emerging around $R = 0.3 \sim \Delta R_{\rm sub}$ is the result of splitting events according to
whether the two hardest energy deposits are within R or not. When they are within R, the $p_T$
of the subjet is close to the $p_T$ of the fat jet. The R-cores are useful in that they interpolate
between a measure of the color-flow induced radiation pattern, at larger R, and the hard splitting
scales, at smaller R.
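The estimate $\Delta R_{\rm sub} \sim 2m/p_T$ used above follows from two-body kinematics in the small-angle limit. The short worked example below assumes massless decay products carrying momentum fractions $z$ and $1-z$, and takes $m_W \approx 80.4$ GeV as an input value not quoted in the text.

```latex
% Two massless decay products with momentum fractions z and 1-z, small opening angle:
m^2 \simeq z(1-z)\, p_T^2\, \Delta R_{\rm sub}^2
\quad\Longrightarrow\quad
\Delta R_{\rm sub} \simeq \frac{m}{p_T \sqrt{z(1-z)}} \;\ge\; \frac{2m}{p_T},
\quad \text{with equality at } z = 1/2.

% Numerical values for m = m_W \approx 80.4 GeV (assumed input):
\frac{2 m_W}{p_T} \approx 0.80 \ (p_T = 200\ \mathrm{GeV}), \qquad
0.32 \ (p_T = 500\ \mathrm{GeV}), \qquad
0.16 \ (p_T = 1000\ \mathrm{GeV}).
```

The symmetric splitting therefore gives the smallest separation, and by $p_T \sim 1$ TeV that separation approaches the 0.1 calorimeter cell size.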

Another way to look at the R-cores is through their average values. Figure 7 shows the
average values of the $p_T$ R-cores as a function of R for the W-jet and the QCD-jet samples.
For the same R, the W-jets tend to have a larger fraction of their $p_T$ in a single subjet.
Also shown is the ratio of these mean values, which peaks around $R = 0.3 \sim \Delta R_{\rm sub}$. This
transition point is another way to estimate which R-core we expect to be most useful.

To see the usefulness of R-cores as discriminants, we show the maximal significance
improvement characteristic as a function of R for the $p_T$ R-cores in Figure 8. We see that
the best single $p_T$ R-core has R ~ 0.4. This is close to the characteristic subjet separation,
$\Delta R_{\rm sub} \sim 0.32$. However, when multiple R-cores are combined (with Boosted Decision Trees,
see the next section), the significance improvement can be much larger, as indicated by
the horizontal line in the figure. Rather than a 15% improvement in significance, which is
the best we can get from one variable, we find a 40% improvement when the variables are
combined. The marginal improvement from adding 10 cores from R = 0.2 to R = 1.1 is
shown in Figure 9.

FIG. 8: Maximal SIC as a function of R when $c_{p_T}(R)$ is individually used, for $p_T \in (500, 550)$ GeV.
The solid horizontal line indicates the SIC when a set of 10 $p_T$ R-cores (R = 0.2 to 1.1) are combined
using BDTs; the dashed vertical line indicates the estimation of $\Delta R_{\rm sub}$ as $\sim 2 m_W/p_T$.

FIG. 9: Gradual significance gain when adding $c_{p_T}(R_i)$ one by one, in the order of $R_i$ =
0.2, 0.3, ..., 1.1, for $p_T \in (500, 550)$ GeV.

FIG. 10: Maximal SICs for the whole set of $c_{p_T}(R)$ using BDTs as a function of $p_T$.

It would be nice if a single variable could substitute for the combination of R-cores.
Clearly, none of the individual R-cores will do, as can be seen from Figure 8. The
R-cores are combining to measure the full radiation profile of the jet. Instead of looking
at R-cores, one could try to look at individual jet shapes. A reasonable candidate is girth,
which is defined in [34] as $g = \sum_{i \in {\rm jet}} \frac{p_T^i}{p_T^{\rm jet}} r_i$, where $r_i$ is the distance of
constituent $i$ from the jet center. Girth can be understood as the $p_T$-weighted average
distance from the jet center, and is closely related to jet broadening. However, we find the
gain from using girth is not comparable to that from the set of 10 R-cores.
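For reference, girth as a $p_T$-weighted average distance from the jet center can be computed as in the following sketch. The `(pt, y, phi)` constituent layout and the explicit jet-axis arguments are assumptions of this example.

```python
import math

def girth(constituents, jet_pt, jet_y, jet_phi):
    """pT-weighted mean distance of (pt, y, phi) constituents from the jet axis."""
    total = 0.0
    for pt, y, phi in constituents:
        dphi = (phi - jet_phi + math.pi) % (2.0 * math.pi) - math.pi  # wrap to [-pi, pi)
        total += pt * math.hypot(y - jet_y, dphi)
    return total / jet_pt
```

Unlike the set of R-cores, this collapses the radial profile into a single number, which is consistent with its weaker performance noted in the text.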

Finally, we show in Figure 10 the maximal significance improvement characteristic from
the combined 10 $p_T$ R-cores in different $p_T$ windows. The efficiency improves dramatically
with higher $p_T$. This is expected because the color-connected partons from W decay are
more collimated at high $p_T$, while the background color connections to the beam remain
roughly the same.



C. Sensitivity to grooming procedures 

As reviewed in Section 3, there are three recently developed general-purpose jet grooming
procedures: filtering, trimming and pruning. Differing in details, these are all found to be
efficient in removing soft QCD radiation from a fat initial jet. Because of the differences in
details among various grooming procedures, the combination of them may give additional
gain in significance compared to using one of them alone. This possibility was pointed out
in [11], where a likelihood analysis was performed on the original jet mass distribution for
jets passing mass window cuts for two different grooming methods. It was also shown in [28]
that combining the mass from mildly and aggressively trimmed jets could improve upon the
significance from a single set of trimming parameters.

FIG. 11: Distributions of grooming sensitivities, ${\rm sens}^m_{\rm filt}$, ${\rm sens}^m_{\rm trim}$, and ${\rm sens}^m_{\rm prun}$, for signal (W-jets)
and background (QCD-jets) for $p_T \in (500, 550)$ GeV. All events satisfy $m_{\rm filt} \in (60, 100)$ GeV.

Here we use another way to combine information from different grooming procedures
based on the sensitivity to grooming. It is expected that for the same fat jet mass and
$p_T$, radiation in QCD-jets has a larger tendency to be groomed away than radiation around
a W-jet. The ratio of the groomed jet mass or $p_T$ to its original value is therefore expected
to be a good measure of this difference. We define dimensionless variables, grooming sensitivities,

$${\rm sens}^m_{\rm filt} = \frac{m_{\rm filt}}{m}, \qquad {\rm sens}^m_{\rm trim} = \frac{m_{\rm trim}}{m}, \qquad {\rm sens}^m_{\rm prun} = \frac{m_{\rm prun}}{m}, \tag{3}$$

and similarly for $p_T$ grooming sensitivities. To be clear, the sample that we test these
on has already passed the filtered mass window cut $m_{\rm filt} \in (60, 100)$ GeV. To calculate
these sensitivities, we use the original jets, before filtering, but which pass the filtered mass
cuts. As expected, these ratios peak towards smaller values for QCD-jets than for W-jets
(Figure 11).



D. Planar flow 



There have been attempts to discriminate jets from heavy particle decays against QCD-jets
by using observables as functions of energy flow of the physical jet [3, 15]. One variable
of such type that we found useful for our purpose is planar flow, $P_f$, which characterizes
the geometric distribution of energy deposition from a jet. Planar flow is defined as
follows. For a given jet we first construct a matrix
$I_w^{kl} = \frac{1}{m_{\rm jet}} \sum_i w_i \frac{p_{i,k}}{w_i} \frac{p_{i,l}}{w_i}$, where $m_{\rm jet}$ is
the jet mass, $w_i$ is the energy of particle $i$ in the jet, and $p_{i,k}$ is the $k$th component of its
transverse momentum relative to the jet's momentum axis. $P_f$ is then defined based on $I_w$
as $P_f = \frac{4 \det(I_w)}{{\rm tr}(I_w)^2} = \frac{4 \lambda_1 \lambda_2}{(\lambda_1 + \lambda_2)^2}$, where $\lambda_{1,2}$ are the eigenvalues of $I_w$.
For linear distributions, $P_f \to 0$, while for isotropic distributions, $P_f \to 1$.

FIG. 12: Signal vs. background planar flow ($P_f$) distributions for $p_T \in (500, 550)$ GeV: (a) $P_f$ for
the fat jet (R = 1.2); (b) $P_f$ for the leading subjet reclustered with R = 0.4.
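The planar flow definition above reduces to the determinant and trace of a 2×2 matrix, as in this minimal sketch. The input layout `(w, p_x, p_y)`, with the two momentum components already projected onto the plane transverse to the jet axis, is an assumption of the example; note that the $1/m_{\rm jet}$ normalization cancels in the ratio.

```python
def planar_flow(constituents, jet_mass):
    """Planar flow from (w, p_x, p_y): energy and momentum transverse to the jet axis.

    Raises ZeroDivisionError if no constituent has momentum transverse to the axis.
    """
    ixx = ixy = iyy = 0.0
    for w, px, py in constituents:
        ixx += px * px / w
        ixy += px * py / w
        iyy += py * py / w
    ixx, ixy, iyy = ixx / jet_mass, ixy / jet_mass, iyy / jet_mass
    det = ixx * iyy - ixy * ixy   # product of the two eigenvalues
    trace = ixx + iyy             # sum of the two eigenvalues
    return 4.0 * det / (trace * trace)
```

Two back-to-back deposits along one transverse direction give 0, while four equal deposits along both transverse axes give 1, matching the linear and isotropic limits quoted above.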

Planar flow has been suggested for top-tagging, since a boosted top jet should be more
isotropic due to three hard prongs coming from its on-shell decay. In contrast, a QCD-jet
is more linear as it typically has two leading hard prongs. Resonances decaying to two
partons are more similar to QCD-jets in terms of $P_f$, but as pointed out in [15] with Higgs
as an example: although both have two prongs and $P_f$ peaks towards 1, the prongs from
the heavy particle decay are sharper and $P_f$ peaks at lower values than QCD. The planar
flow distributions for W-jets and their QCD-jet background are shown in Figure 12. We see
that planar flow promises to still be a useful discriminant. Planar flow becomes even more
useful at higher $p_T$.

We find it useful to consider not just the planar flow of the original fat jet, $P_f$, but also the
planar flow of the highest $p_T$ subjet resulting from reclustering with R = 0.4, $P_f(0.4)$.
R = 0.4 is more useful for high $p_T$ samples, while R = 1.2 is more useful for low $p_T$ samples,
which is related to the $p_T$-dependence of proper jet cone sizes.



E. Features of Sub jets 

After reclustering with smaller R during filtering, we get a set of subjets from the original 
fat jet. Variables related to these subjets can further distinguish substructure of PF-jets from 
that of QCD-jets. It is known that the two subjets from the decay of a massive particle 
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FIG. 13: Signal and background distributions of p^ b2 /pT, ^Rsub and n su b for Pj, £ (500, 550) GeV 
samples in the filtered mass window. 



are more symmetric in px than those from QCD. In fact, the y cut parameter in the filtering 
algorithm is based on this consideration. We call the subjet with the highest pt subjet 1 
and the one with the second highest pt subjet 2. 

Two variables that we find useful are the ratios of the p_T's of the two leading subjets to
the original jet p_T: p_T^sub1/p_T and p_T^sub2/p_T. These variables are more useful than
p_T^sub1/p_T^sub2 alone. Another useful variable is the geometric distance in the η-φ plane
between the two leading subjets, ΔR_sub. For signal jets it peaks at smaller values than for
QCD-jets. Finally, the total number of subjets (p_T > 10 GeV) after the filtering process,
n_sub, can help. n_sub concentrates around smaller values for W-jets than for QCD-jets. This is
because, compared with W-jets, QCD radiation is more diffusely distributed. For illustration
plots, see Figure 13.
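The subjet variables described above are simple to compute once the filtered subjets are
available. The following is a minimal sketch, assuming each subjet is supplied as a
(p_T, η, φ) tuple; the function name, the dictionary keys and the example numbers are
hypothetical and only meant to make the definitions concrete.

```python
import math


def subjet_features(jet_pt, subjets, pt_min=10.0):
    """Compute p_T ratios, Delta R and multiplicity from subjets given as (pt, eta, phi)."""
    # Keep subjets above the p_T threshold, ordered from hardest to softest.
    kept = sorted((s for s in subjets if s[0] > pt_min), key=lambda s: s[0], reverse=True)
    features = {"n_sub": len(kept)}

    if len(kept) >= 2:
        (pt1, eta1, phi1), (pt2, eta2, phi2) = kept[0], kept[1]
        # Wrap the azimuthal difference into [-pi, pi).
        dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
        features["pt_sub1_over_pt"] = pt1 / jet_pt
        features["pt_sub2_over_pt"] = pt2 / jet_pt
        features["delta_r_sub"] = math.hypot(eta1 - eta2, dphi)
    return features


if __name__ == "__main__":
    toy_subjets = [(300.0, 0.1, 0.2), (200.0, -0.1, 0.5), (5.0, 0.0, 0.0)]
    print(subjet_features(520.0, toy_subjets))
```

In this toy input the third subjet falls below the 10 GeV threshold, so it does not count
towards n_sub.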



5. MULTIVARIATE ANALYSIS FOR OPTIMAL W-JET TAGGING

So far we have seen how certain variables may help improve significance when individually 
used. A proper combination of different variables could optimize the discrimination power 
as it incorporates more details of radiation pattern. As before, we consider the SM WW 
(semi-leptonic) and Wj (leptonic W decays) data samples which have been processed with 
filtering and then passed an m_filt ∈ (60, 100) GeV mass window cut. After the mass window
cut, the original unfiltered fat jets are used for subsequent analysis. 

Simple rectangular cuts cannot make optimal use of multiple variables since they overlook
the multidimensional correlations. Instead we use more sophisticated multivariate
techniques, as implemented in TMVA (Toolkit for Multivariate Data Analysis with ROOT)
[35], to maximize the efficiencies. In particular, we use the Boosted Decision Trees (BDT)
method, which appears fast and reliable, and particularly well suited for high energy theory
analyses. Details of this method as used in particle physics can be found in the literature.
As we will see, using our variables and BDTs is significantly better than filtering
alone, with an additional factor of 2-3 improvement in SIC. One can then apply the cuts
giving the maximal SIC to data samples from different processes (we will show two examples
later: Z' discovery and Wj as signal vs. jj). Such applications also test the robustness of
multivariate methods.

For various jet p_T's we begin with ~10^5 signal events and ~10^6 background events
after the filtered mass cut as input samples. We first rank the individual variables based
on the SIC when they are individually used. Then among those at the top we try to find
a combination of variables for which the improvement in S/√B almost saturates (adding
even more variables on top has little effect). Some variables, like the pull angles, girth, or
mass R-cores, tend not to help on top of other top variables, so they are not used for the
final list. A nice feature of the BDT method is that adding useless variables does not
particularly degrade the training speed or final efficiencies. A set of 25 variables (all these
variables have been defined in Section 4) that saturates the efficiencies is

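$$m_{\rm jet},\;\; c_{p_T}(0.2\text{-}1.1),\;\; {\rm sens}^{\rm filt,trim,prun}_{m,\,p_T},\;\; Pf,\;\; Pf(0.4),\;\; \frac{p_T^{\rm sub1,sub2}}{p_T},\;\; m^{\rm sub1},\;\; m^{\rm sub2},\;\; \Delta R_{\rm sub},\;\; n_{\rm sub}. \qquad (4)$$

We use 10 p_T R-cores, from R = 0.2 to R = 1.1 in steps of 0.1, and 6 grooming sensitivities.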

Figure 14 shows the SIC curves (ε_S/√ε_B as functions of ε_S) for these variables, as each one
(or set) is added. The curves are cumulative. The big jumps in the lower curves come from
adding the 10 R-cores and then the two filtering sensitivities as groups. Naturally, the
discrimination efficiency of the variables is p_T dependent, so plots for p_T ∈ (200, 250),
(500, 550) and (1000, 1050) GeV are shown separately. Figure 15 shows the maximal SIC using
these 25 variables as a function of p_T. We see the improvement gets more appreciable towards
higher p_T.
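To make the SIC curve concrete, the sketch below scans a cut on a generic classifier output
and records (ε_S, ε_B, ε_S/√ε_B) at each threshold. It is only an illustration in plain Python:
the function names and the toy score lists are assumptions, and the actual analysis uses
TMVA BDT responses rather than these made-up numbers.

```python
import math


def sic_curve(signal_scores, background_scores):
    """Return (eps_S, eps_B, SIC) for every candidate cut 'score >= threshold'."""
    n_sig, n_bkg = len(signal_scores), len(background_scores)
    curve = []
    for cut in sorted(set(signal_scores) | set(background_scores)):
        eps_s = sum(s >= cut for s in signal_scores) / n_sig
        eps_b = sum(b >= cut for b in background_scores) / n_bkg
        if eps_b > 0.0:  # SIC is undefined when no background survives
            curve.append((eps_s, eps_b, eps_s / math.sqrt(eps_b)))
    return curve


def max_sic(signal_scores, background_scores):
    """Working point with the largest significance improvement."""
    return max(sic_curve(signal_scores, background_scores), key=lambda point: point[2])


if __name__ == "__main__":
    toy_signal = [0.9, 0.8, 0.7, 0.4]
    toy_background = [0.85, 0.3, 0.2, 0.1]
    print(max_sic(toy_signal, toy_background))
```

With finite samples the curve becomes noisy at very small ε_B, so in practice one would
restrict the scan to thresholds where enough background events survive.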

In practice if one prefers to use fewer variables and be less ambitious about significance
gain, one can do almost as well with a subset of these variables. For example, if we take the
7 variables

$$m(0.5),\;\; m(0.4),\;\; m_{\rm filt},\;\; m^{\rm sub1},\;\; m^{\rm sub2},\;\; \frac{p_T^{\rm sub2}}{p_T},\;\; Pf(0.4), \qquad (5)$$

we can achieve a significance gain of ~1.9 over the filtered sample, as compared to ~2.4
using the full 25 variables. This particular subset of variables is partially motivated by
having smaller sensitivity to the underlying event, as will be discussed in Section 7 below.

FIG. 15: The maximal SIC with MVA using all 25 principal variables as a function of jet p_T, and
the corresponding signal and background efficiencies. The background efficiencies are multiplied
by 10.



6. W-POLARIZATION DEPENDENCE 



As is well known, the distribution of W decay products depends on the polarization of
the W. This has an effect on the W-jet substructure and can therefore be exploited either
to improve efficiency if the polarization of the sample is known, or even to measure the
W-polarization if the statistics are high enough. Similar ideas were used for top-tagging
[21].



Let us define θ as the angle between an up-type fermion (including u and c quarks and
neutrinos) and the W+ moving direction in the rest frame of the W+. Then the probability
density of finding the fermion is given by

$$P(\cos\theta) = \begin{cases} \frac{3}{8}\,(1 \mp \cos\theta)^2 & \text{for } h_{W^+} = \pm 1, \\[4pt] \frac{3}{4}\,(1 - \cos^2\theta) & \text{for } h_{W^+} = 0, \end{cases} \qquad (6)$$

where h_{W+} is the helicity of the W+ boson. For a down-type anti-fermion, (1 ∓ cos θ) flips
to (1 ± cos θ) in the first line of Eq. (6). The formula holds for W- too if we replace up-type
with down-type.
FIG. 16: Ratio of the p_T of the lower p_T parton to the p_T of the higher p_T parton from a W
decay, for different W polarizations.

These distributions imply that for transverse Ws the probability density is maximum at
cos θ ~ ±1, which means one of the decay products tends to go along the W momentum and
the other one against it. When the W is boosted, this results in an unbalanced configuration
for the two decay products' momenta, namely, one smaller than the other. On the other
hand, for longitudinal Ws, the probability density is maximum at cos θ ~ 0, where the decay
products' momenta are perpendicular to the W momentum in the W rest frame, and more
balanced when boosted. Since a QCD splitting tends to produce an unbalanced momentum
configuration, transverse Ws behave more like QCD-jets than longitudinal Ws, and we
expect better identification for longitudinal ones. For the SM W-pair production, the Ws
are dominantly transverse: about 92% for p_T > 200 GeV. Therefore, the results reported
in the previous sections can be viewed to good approximation as for transverse Ws. There
are also cases where the Ws are dominantly longitudinal, for example, Ws from a heavy
SM-like Higgs decay or high energy WW scattering.
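The balance argument above can be checked numerically. The sketch below assumes the standard
helicity distributions (3/8)(1 ∓ cos θ)^2 for transverse and (3/4)(1 - cos^2 θ) for longitudinal
Ws, treats the two decay products as massless, and uses the ratio of their boosted energies,
(1 - β|cos θ|)/(1 + β|cos θ|), as a proxy for the p_T ratio. The W mass value, the choice of
p = 525 GeV and all function names are illustrative assumptions.

```python
import math

M_W = 80.4  # GeV, assumed W mass for this illustration


def pdf_transverse(cos_theta, helicity=+1):
    """Decay-angle density for a W+ with helicity +1 or -1."""
    return 0.375 * (1.0 - helicity * cos_theta) ** 2


def pdf_longitudinal(cos_theta):
    """Decay-angle density for a W+ with helicity 0."""
    return 0.75 * (1.0 - cos_theta ** 2)


def energy_ratio(cos_theta, beta):
    """Softer-to-harder energy ratio of two massless decay products after the boost."""
    return (1.0 - beta * abs(cos_theta)) / (1.0 + beta * abs(cos_theta))


def mean_ratio(pdf, beta, steps=20000):
    """Midpoint-rule average of the energy ratio; also returns the pdf normalization."""
    width = 2.0 / steps
    total = norm = 0.0
    for i in range(steps):
        c = -1.0 + (i + 0.5) * width
        weight = pdf(c) * width
        total += weight * energy_ratio(c, beta)
        norm += weight
    return total / norm, norm


if __name__ == "__main__":
    momentum = 525.0  # GeV, roughly the middle of the (500, 550) GeV bin
    beta = momentum / math.sqrt(momentum ** 2 + M_W ** 2)
    print("transverse  :", mean_ratio(pdf_transverse, beta)[0])
    print("longitudinal:", mean_ratio(pdf_longitudinal, beta)[0])
```

Under these assumptions the longitudinal sample gives a noticeably larger average ratio than
the transverse one, in line with the qualitative statement that longitudinal Ws decay to more
balanced subjets.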

To study the longitudinal case, we start by generating WW pairs using Madgraph, but
this time we decay the Ws manually according to P(cos θ) ∝ (1 - cos^2 θ). Note that in this
way the spin correlation between the two Ws in the same event is not included, but it does
not affect our results since the leptonic W is excluded from jet clustering. In Figure 16 we
display the p_T ratio between the two partons from a W decay for p_T ∈ (500, 550) GeV. As
expected, the momenta are more balanced for longitudinal Ws than transverse ones. The
events are then processed with Pythia 8 and we repeat the procedure described in Sections
3 through 5.

FIG. 17: Tuning of filtering parameters for longitudinally polarized W-jets versus QCD jets, as
functions of jet p_T (GeV): (a) optimized filtering parameters; (b) optimized efficiencies and
SICs. For comparison, the results for transverse Ws obtained earlier are reproduced here.

The filtering parameters which maximize the SIC for the longitudinal sample are shown
in Figure 17. The fact that the two subjets are more balanced allows us to use tighter cuts
to cut more background events for the same signal efficiency, resulting in higher SIC than
in the transverse case. The multivariate analysis provides a further, larger significance gain
than for the transverse case, as can be seen in Figure 18. All together, after filtering and
our MVA W-jet tagging, the maximal SIC is ~7.0 for longitudinal Ws, significantly larger
than that of transverse Ws, ~5.3.

The polarization effect of the W boson poses a question: what parameters/cuts should 
we use when looking for boosted Ws? This depends on our goal: if we are looking for W 
bosons inclusively, we should be conservative and use relatively loose cuts obtained from 
transverse Ws; if we are interested in a particular process dominated by longitudinal Ws, 
we should use tighter cuts optimized for longitudinal ones. 

7. DIFFERENCES IN MONTE CARLO TOOLS 

In our analysis, we have extensively utilized the differences in radiation patterns between
W-jets and QCD-jets. These patterns have not been measured at high p_T and we have been
relying on Pythia 8 simulations. It is important to cross-check using different Monte Carlo
tools, which is the subject of this section. It is also possible to compare the same event
generator with different tunes. Up to now, all results have been obtained with the default
tune of Pythia 8.142. We also tried the tune "3C", which is a tune to the Tevatron and early
LHC data for initial state radiation, multiple interactions and beam remnants. There were
no discernible differences between these tunes for our variables. So we restrict the discussion
in this section to a comparison of Pythia 8 and Herwig++. We perform the comparison by
testing the cuts/parameters/BDTs trained on Pythia 8 event samples on samples generated
with Herwig++ v2.4.2 [32].

FIG. 18: The SIC using BDTs as a function of signal efficiency for transverse and longitudinally
polarized W's. This is for p_T^jet ∈ (500, 550) GeV and these gains are on top of the factors of
~2 or ~2.5 for the two samples from filtering, as shown in Figure 17(b).

As before we look at WW and W+jet in the SM. With each Monte Carlo, we use the
same jet algorithm (Cambridge/Aachen with R = 1.2) to find the high p_T jets. We consider
only jets with p_T ∈ (500, 550) GeV. We apply the filtering/pruning/trimming procedure
using the parameters given in Table 3 in Appendix A. As before, only events passing the
filtered mass window cut, m_filt ∈ (60, 100) GeV, are retained. For Herwig++ data samples,
the efficiencies after the mass window cut for the signal and background jets are respectively
64.4% and 8.68%, yielding a significance gain of 2.18. The corresponding efficiencies for
Pythia 8 are 65.8% and 8.88%, yielding a very similar significance gain of 2.21. So, as far
as the filtering/mass-drop step is concerned, there is hardly any difference.
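As a quick arithmetic check of the numbers just quoted, and assuming the significance gain is
the ratio ε_S/√ε_B used for the SIC elsewhere in this section, the rounded efficiencies give
approximately

```latex
\left.\frac{\epsilon_S}{\sqrt{\epsilon_B}}\right|_{\text{Herwig++}}
  = \frac{0.644}{\sqrt{0.0868}} \approx 2.19,
\qquad
\left.\frac{\epsilon_S}{\sqrt{\epsilon_B}}\right|_{\text{Pythia 8}}
  = \frac{0.658}{\sqrt{0.0888}} \approx 2.21 .
```

These agree with the quoted gains to within the rounding of the input efficiencies, and the
two generators differ by about 1% at this stage.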

We then obtain the values of the variables defined in Section 5 and evaluate the BDT
response using weight files trained on Pythia 8 event samples. In Figure 19(a), we show
the significance gain as a function of the signal efficiency, for jets with p_T ∈ (500, 550) GeV.
From Figure 19(a), we see that the Pythia 8 results differ significantly from Herwig++. The
most likely origin of the difference is in the modeling of the underlying event (UE), which
can have an important effect on jet substructure. To test this, we show in Figure 19(b) the
result with UE turned off for both Pythia 8 and Herwig++ (see footnote 2). For this figure, we
retrained the BDT on the Pythia 8 sample and then tested it on both Pythia 8 and Herwig++. The
BDT responses without the underlying event are much less sensitive to the Monte Carlo.

FIG. 19: Significance improvements resulting from a boosted decision tree trained on Pythia 8, and
tested on Pythia 8 or Herwig++, for p_T^jet ∈ (500, 550) GeV.

We can better understand the difference between the Monte Carlos by examining the
contributions to our variables from the underlying event. Let us start with jet masses. We have
found that Herwig++ in general produces more radiation through the underlying event than
Pythia 8, which can be seen from Figure 20. In Figure 20(a), we show the W-jet mass
distributions in the signal sample after filtering. For p_T^jet ∈ (500, 550) GeV, the distance
between the two subjets is only about 0.3-0.4. Therefore, the filtered mass receives small
contributions from initial state radiation and the underlying event, and as expected, the two
Monte Carlos give almost identical distributions. On the other hand, we see from Figure 20(b)
that the original jet mass (R = 1.2) from Herwig++ is larger than from Pythia 8. By using
R = 1.2 for jet clustering, we include ISR and UE contributions in a large region, which makes
the difference between the two Monte Carlos manifest. For comparison, the jet mass without UE is
given in Figure 20(c), showing the opposite behavior in the mass tail, namely, the Herwig++ jet
mass is slightly smaller. This clearly shows that Herwig++ produces more radiation through
UE. Consequently, for Herwig++, W-jets look more like QCD-jets (compare Figure 5), which
explains the smaller significance improvement using Herwig++. Similar behavior can be seen
in other variables. For example, in Figure 21 we compare the planar flow for two different
R's, R = 0.4 and R = 1.2, for signal jets with p_T ∈ (500, 550) GeV. We see very small
differences between Pythia 8 and Herwig++ for R = 0.4 but significant differences for R = 1.2.
Again, for the R = 1.2 case more UE is included, which explains the dramatic difference.

2 What is turned off is multiple interactions, using the switch "PartonLevel:MI = off" for
Pythia 8 and "set /Herwig/Shower/ShowerHandler:MPIHandler NULL" for Herwig++.

FIG. 20: Simulation dependence of jet masses, for W-jets only.
Another way to understand the effect is through the grooming sensitivities. In Figure 22
we draw separately the trimming sensitivity, sens^trim, for W-jets and QCD-jets. We also
draw distributions with UE turned off, and distributions with both UE and ISR turned off.
In the latter case, the only contribution to the radiation is through final state radiation;
we see that sens^trim is much more concentrated around 1 for W-jets than QCD-jets, which
means much less radiation is trimmed away for W-jets. After adding the ISR, the difference
is still dramatic. When all contributions are included, the difference between W-jets
and QCD-jets becomes smaller. This explains why one can obtain better discrimination
power by turning off UE, as shown in Figure 19. Moreover, Figure 22 clearly shows that
more radiation is trimmed away for Herwig++ than for Pythia 8, in both the signal and
background distributions. The difference is more significant in the signal distributions and
again, the Herwig++ result is more similar to the background.

We have seen that the variables which show the larger differences involve larger, or
unfiltered, jets, and are therefore more sensitive to the UE. This motivates us to consider only
variables defined within a small region around the candidate W-jet direction. Such a set of
variables was listed in Eq. (5). In Figure 23, we show the significance improvement using
this set. The difference between the two Monte Carlos is clearly smaller than in Figure 19
but still visible.

FIG. 23: SIC curves obtained using a smaller set of variables meant to reduce dependence on
modeling of the underlying event. The same BDTs trained on Pythia 8 are tested on Pythia 8 and
Herwig++. For comparison, Figure 19(a) is reproduced here in the second panel.

8. APPLICATIONS 

In this section, we apply the method presented in the previous sections to other processes
involving boosted W bosons. We demonstrate the robustness of our method as a general
purpose W-tagger, and show the improvements compared to more conventional methods.



A. Z' → W+W- → l± + j + E_T^miss

A well-motivated application of our W-jet tagging method is the search for a new vector
resonance Z' via pp → Z' → W+W- → lνqq. In addition to the general possibility that a
new Z' can have a significant coupling to W+W-, this channel is particularly important in
models where electroweak symmetry breaking is related to strong dynamics. In technicolor
or 5D Higgsless [37] models, exchanging a tower of Z' resonances is essential for restoring
unitarity in high energy W_L W_L scattering as a substitute for a light Higgs. For a Z' with
couplings similar to those of the Z, direct searches and electroweak precision constraints
have pushed its allowed mass to be above ~1 TeV [38]. W bosons produced from such a
heavy Z' are expected to be highly boosted, and therefore provide a natural arena to test our
method.

In more conventional methods, the hadronic W from a Z' decay is either treated as
two separate jets or one fat jet. For example, the authors of Ref. [39] demand two jets
reconstructing the W mass and separated by ΔR_jj > 0.4. This method eliminates a large
fraction of the signal when M_Z' > 1 TeV due to the merging of the W decay products into one
jet. In the study of a TeV scale Kaluza-Klein Z' in Randall-Sundrum (RS) models in Ref. [40],
the authors use a simple jet mass cut around M_W with jet size R = 0.4. We will see that the
latter gives similar results to filtering, while using our W-jet tagging method, we obtain
significantly better results in both S/√B and S/B.

For concreteness we consider a Z' which couples to the SM fermions and gauge bosons
with the same Lorentz structure as the SM Z boson, yet with rescaled strength. We choose
the couplings g_Z'ff = 0.2 g_Zff, with g_Z'WW likewise rescaled from g_ZWW, as in typical RS
models [39]. We consider a Z' with a mass M_Z' = 1.5 TeV and a width ~125 GeV. We consider
the 14 TeV run of the LHC, where the effective cross section for Z' → W+W- in the semileptonic
channel is 26.4 fb. Note that for such a high mass Z', 97.5% of events have ΔR < 0.4 for
the two quarks from the W decay (parton level), making it very difficult to identify two
separate jets. Therefore, we focus on the methods in which the W's are identified as single jets.
The signal events therefore contain l± + 1j + E_T^miss. The major SM backgrounds are W + 1j,
WW and tt̄. All signal and background events are generated with Madgraph 4 at parton
level. As before, the events are processed with Pythia 8 and jets are found with the C/A
algorithm using R = 1.2. The following kinematic cuts are then applied:

$$|\eta_l| < 2.5,\quad |\eta_j| < 3,\quad p_T^l > 100~\text{GeV},\quad p_T^j > 500~\text{GeV},\quad E_T^{\rm miss} > 100~\text{GeV}, \qquad (7)$$

where the p_T cuts apply to the leading jet and lepton, which are assumed to be the W-jet
and the lepton from the leptonic W decay.

To efficiently reduce QCD backgrounds, especially the tt̄ background, we veto additional
central jets with

$$|\eta_j| < 3 \quad\text{and}\quad p_T^j > 100~\text{GeV}. \qquad (8)$$

We then apply our W-jet tagging procedure on the leading jet in events passing the above 
cuts to identify the hadronic W's. In particular, we use the same parameters and BDT 
weight files obtained before from training the SM WW/Wj samples. 

The naive way of applying the BDT weight files is to impose the optimal BDT cuts for
maximizing the SIC of W-jets vs QCD-jets, since W+jet is the dominant background in
our Z' search. However, our method is so efficient at reducing the QCD-jets that,
after doing so, the W+jet background is comparable to the WW and tt̄ backgrounds, which
contain W-jets as well. Therefore, the optimal BDT cuts when all backgrounds are included
are different from before. In order to obtain the best significance for the Z' search, we use
the same BDT weight files while scanning the BDT cuts for each bin to maximize S/√(ΣB),
where the sum is over the Wj, WW and tt̄ SM backgrounds weighted by their cross-sections.
The result presented below is then the optimal one from such a scan.

TABLE 1: Number of events, S/√B and S/B at 2 fb^-1 for signals with M_Z' = 1.5 TeV and major
SM backgrounds. A (1300, 1700) GeV mass window cut is imposed on the reconstructed Z' mass.
Numbers in parentheses are for the case when only Wj is taken as the background. (a)

                     signal   Wj     tt̄     WW     S/√B        S/B
  Kinematic cuts     23       148    12     2.1    1.8 (1.9)   0.14 (0.15)
  Filter             18       10     1.4    1.2    5.0 (5.6)   1.4 (1.7)
  MVA                11       0.91   0.35   0.68   7.6 (11)    5.5 (11)
  R = 0.4 mass cut   22       22     2.4    1.4    4.3 (4.6)   0.85 (1.0)

(a) Note that for small numbers of events, Poisson statistics should be used to extract the exact
significance. Assuming an integer number of events closest to the expectation value of S + ΣB is
observed, we have the significances: 2.0, 4.3, 5.3 and 3.9.

The presence of only one neutrino in the final state allows the reconstruction of its
momentum by requiring transverse momentum conservation and applying the W mass constraint.
In doing so, we obtain two solutions for the neutrino p_z, which, combined with the hadronic
W momentum, give rise to two reconstructed WW masses. We take the minimum of the two
reconstructed masses, M_WW^rec. The resulting M_WW^rec distributions are shown in Figure 24,
where an integrated luminosity of 2 fb^-1 is assumed. We then apply a cut on the Z' mass,
M_WW^rec ∈ (1300, 1700) GeV. The numbers of events within this window at various steps are
given in Table 1 together with S/√B and S/B. For comparison, we have also included
the results using the conventional jet mass method, obtained by reclustering the events with
R = 0.4 and applying the kinematic cuts as well as a jet mass cut of (60, 100) GeV for candidate
W-jets.
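The neutrino reconstruction described above amounts to solving a quadratic equation for p_z.
A minimal sketch follows, assuming a massless lepton and neutrino, identifying the missing
transverse momentum with the neutrino p_T, and taking M_W = 80.4 GeV; the function names and
the choice to drop the imaginary part when the discriminant is negative are assumptions for
illustration, not necessarily what the analysis code does.

```python
import math

M_W = 80.4  # GeV, assumed W mass used in the constraint


def neutrino_pz_solutions(lepton, met_x, met_y, m_w=M_W):
    """Solve (p_lepton + p_neutrino)^2 = m_w^2 for the neutrino p_z.

    lepton is (px, py, pz) of a massless charged lepton with non-zero p_T.
    """
    lx, ly, lz = lepton
    pt_l2 = lx * lx + ly * ly
    e_l = math.sqrt(pt_l2 + lz * lz)
    pt_nu2 = met_x * met_x + met_y * met_y
    a = 0.5 * m_w * m_w + lx * met_x + ly * met_y
    disc = a * a - pt_l2 * pt_nu2
    # Assumption: for a negative discriminant keep only the real part (degenerate solution).
    root = math.sqrt(disc) if disc > 0.0 else 0.0
    return ((a * lz - e_l * root) / pt_l2, (a * lz + e_l * root) / pt_l2)


def min_ww_mass(lepton, met_x, met_y, hadronic_w, m_w=M_W):
    """Smaller of the two reconstructed WW masses; hadronic_w is (E, px, py, pz)."""
    lx, ly, lz = lepton
    e_l = math.sqrt(lx * lx + ly * ly + lz * lz)
    masses = []
    for nu_pz in neutrino_pz_solutions(lepton, met_x, met_y, m_w):
        e_nu = math.sqrt(met_x * met_x + met_y * met_y + nu_pz * nu_pz)
        energy = e_l + e_nu + hadronic_w[0]
        px = lx + met_x + hadronic_w[1]
        py = ly + met_y + hadronic_w[2]
        pz = lz + nu_pz + hadronic_w[3]
        masses.append(math.sqrt(max(energy ** 2 - px ** 2 - py ** 2 - pz ** 2, 0.0)))
    return min(masses)


if __name__ == "__main__":
    print(neutrino_pz_solutions((40.0, 0.0, 30.0), -10.0, 30.0))
```

Taking the minimum of the two masses follows the prescription in the text; other analyses
choose, for example, the solution with the smaller |p_z|.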

From Table 1, we see the traditional jet mass method gives similar S/√B to filtering,
while using our W-jet tagging method, we obtain significantly better results in both S/√B
and S/B. Note that the signal efficiency after filtering is larger than the efficiencies quoted
earlier for the SM WW sample because the W's from Z' decays are dominantly longitudinal.

FIG. 24: Invariant mass distributions for signal (Z' → W+W- → l± + j + E_T^miss) and backgrounds.
The upper left panel is for the conventional jet mass method (R = 0.4).
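The S/√B and S/B columns of Table 1 can be approximately recomputed from the event counts,
as the short sketch below shows. The counts are copied from the table; since they are rounded,
the recomputed values only agree with the quoted ones to within that rounding. The dictionary
layout and function name are illustrative assumptions.

```python
import math

# Event counts copied from Table 1: (signal, Wj, ttbar, WW) at 2 fb^-1.
TABLE_1 = {
    "Kinematic cuts": (23, 148, 12, 2.1),
    "Filter": (18, 10, 1.4, 1.2),
    "MVA": (11, 0.91, 0.35, 0.68),
    "R = 0.4 mass cut": (22, 22, 2.4, 1.4),
}


def significance(signal, backgrounds):
    """Return (S/sqrt(B), S/B) for a list of background counts."""
    total_b = sum(backgrounds)
    return signal / math.sqrt(total_b), signal / total_b


if __name__ == "__main__":
    for step, (s, wj, tt, ww) in TABLE_1.items():
        all_bkg = significance(s, [wj, tt, ww])
        wj_only = significance(s, [wj])
        print(f"{step:18s} S/sqrt(B) = {all_bkg[0]:.2f} ({wj_only[0]:.2f})"
              f"  S/B = {all_bkg[1]:.2f} ({wj_only[1]:.2f})")
```

These Gaussian estimates are only indicative for the small event counts in the lower rows,
where the table footnote recommends Poisson statistics instead.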



B. Dijet versus W+jet

Our last test and application of the method is to consider the possibility of identifying
boosted W bosons in dijet events at the early LHC. We consider the 7 TeV run with 1
fb^-1 integrated luminosity (see footnote 3). We will not include systematic uncertainties such
as from the QCD dijet cross-section calculation, since the main purpose here is to test the
robustness of our method. In this process, there is no way to distinguish hadronic W and Z bosons
except for the mass difference. If one would like to identify both W's and Z's, it is better to
rerun the optimization procedure including both W's and Z's. For example, we should probably use
a wider filtered mass window and also include both W's and Z's when training the BDT.
As a direct test of our method, we apply exactly the same cuts/weight files obtained above
and treat Z+jet as background.

We consider jets with p_T > 400 GeV. The jet mass distributions for W+jet, QCD dijet and
Z+jet event samples (generated with Pythia 8) are shown in Figure 25. The corresponding
numbers of jets, S/√B and S/B are shown in Table 2. Note that in the W+jet sample, only
half of the high p_T jets come from a W decay. If only the W's are counted as signal, S/√B
and S/B in the first row of Table 2 should be cut in half to 1.1 and 0.0016 respectively.
Then we see filtering increases the significance by a factor of ~2, which is increased further
by a factor of 2.2 after MVA. This is in line with the results given in Section 5, although
the processes and center of mass energy are different.

3 A similar study using the filtering method alone has been performed in [43].





                   W+jet   QCD dijet   Z+jet   S/√B   S/B
  p_T > 400 GeV    1570    490k        753     2.2    0.0032
  filtering        594     67k         250     2.3    0.0088
  MVA              153     906         34      5.1    0.17

TABLE 2: Number of jets in different dijet samples for the 7 TeV LHC with 1 fb^-1 integrated
luminosity.

9. CONCLUSION AND DISCUSSIONS 

In this article, we have investigated the differences between QCD-jets and highly boosted
hadronically decaying color singlet particles. We have shown that excellent distinguishing
power can be achieved by utilizing a multivariate method: for jets with p_T > 200 GeV, we
obtain a factor of ~5 improvement in the statistical significance. We have considered W
bosons as an example, and the same method can be used on highly boosted Z bosons or
Higgs bosons as well.

There are two major differences between a W-jet and a QCD-jet. First, the two subjets
initiated by the two quarks from a W decay tend to carry momenta of similar size with
their angular distance determined by the W mass and momentum. If the W boson is not
too boosted (p_T < 1200 GeV), two clean subjets can be identified using usual jet algorithms
but with smaller radius. On the other hand, due to collinear and soft divergences, a QCD
splitting tends to produce either two partons too close to be identified as two separate
subjets, or two separate partons with hierarchical momenta. Therefore, we can distinguish
a W-jet from a QCD-jet by requiring two subjets with balanced momenta. This is the idea
behind the jet grooming algorithms proposed for identifying boosted decaying particles.
However, as we mentioned in the introduction, jet grooming alone cannot give us the
optimal discriminating power because information regarding radiation patterns is discarded.

FIG. 25: Application of W-jet tagging to the hadronic-W+jet search. Top: jets with p_T > 400 GeV;
middle: after filtering+mass-drop; bottom: after multivariate analysis. The W+jet and Z+jet
contributions are multiplied by 10 in the top two panels to make them visible.

Indeed, the second difference between W-jets and QCD-jets lies in the different patterns of
final state radiation, which have not been explored sufficiently in the literature. For example,
the radiation of a boosted color singlet particle such as a W is mostly concentrated within
a small region around its momentum. In this article, we have identified a set of efficient jet
substructure variables and combined them in a multivariate analysis. We have found much
better discriminating power than using jet grooming alone: a factor of 2-3 improvement
in the statistical significance is achieved on top of the filtering results.

We have used the SM WW → lνqq and Wj → lνj processes to optimize the discrimination
power. It turns out that the variables we use characterize generic properties of high
p_T jets, independent of the specific process. We have illustrated this by considering two
interesting applications. The first one is a Z' search at the LHC with center of mass energy
of 14 TeV, with the Z' decaying to a W pair and the W's decaying semileptonically. The
second one is searching for hadronic-W+jet events in dijet events at the 7 TeV LHC. In both
processes, we have identified the boosted W's using the same multivariate W-jet tagging
algorithm trained to distinguish the SM WW events from the SM Wj events. We have found
significant improvement over existing methods, consistent with the SM WW/Wj results.

We have obtained our results using Pythia 8 simulations. As another test, we have applied 
exactly the same cuts obtained from Pythia 8 on data samples simulated with Herwig++. 
We have found a 25% difference in the maximal significance, with Herwig++ giving the 
smaller value. As we have verified, most of the difference comes from the different treatment 
of the underlying event in the two Monte Carlo tools, which should be resolved once both 
Monte Carlos are tuned to the LHC measurements. We have also shown that by using a subset
of the variables that are less sensitive to the underlying event, we obtain more robust results
which are almost as good as those from the whole set.

Finally, we point out that the code for W-jet tagging is publicly available at
http://jets.physics.harvard.edu/wtag. This code contains the trained boosted decision trees
and can be used immediately in applications. Users can also conveniently use the provided
routines to examine the jet substructure variables and/or train their own event samples.
Appendix A: Filtering/trimming/pruning

All three jet grooming algorithms start from a jet found with some recombination
algorithm such as the k_t, anti-k_t and Cambridge/Aachen (C/A) algorithms. It turns out filtering
with mass drop gives us slightly better significance than pruning and trimming. For filtering,
the C/A jet algorithm works the best, so we will fix the jet algorithm to C/A, except for
trimming (see below). Starting from a jet with relatively large size R, the jet grooming
algorithms act on the fat jet as follows:

1. Filtering with mass drop [8]: For a given jet found with recombination parameter R,
we first look for a significant "mass drop" by the following procedure:

(a) Undo the last step of jet clustering for jet j. The two resulting subjets j_1, j_2 are
ordered such that m_{j1} > m_{j2}.

(b) Stop the algorithm if a significant mass drop is found and the splitting is not too
asymmetric, i.e., if the following conditions are met:

$$m_{j_1} < \mu\, m_j \quad\text{and}\quad y = \frac{\min(p_{T j_1}^2,\, p_{T j_2}^2)}{m_j^2}\,\Delta R_{j_1,j_2}^2 > y_{\rm cut}, \qquad (A1)$$

where μ and y_cut are free parameters.

(c) Otherwise redefine subjet j_1 as j and repeat.

When a mass drop is found, we use R_filt = min(0.3, R_{j1,j2}/2) to recluster particles
contained in j_1 and j_2. The three hardest subjets are retained and combined as the
new "filtered" jet. It is possible to do the reclustering procedure without the mass drop
algorithm. Nevertheless, in our analysis mass drop is always included, and implicitly
assumed whenever we refer to filtering.

2. Pruning [4]: For a given jet, we recluster it with the C/A algorithm, but when trying to
merge subjets i, j → p, the following condition is checked:

$$z = \frac{\min(p_{Ti},\, p_{Tj})}{p_{Tp}} < z_{\rm cut} \quad\text{and}\quad \Delta R_{ij} > D_{\rm cut}, \qquad (A2)$$

where z_cut and D_cut are free parameters. If the condition is met, do not merge the
two subjets and the one with smaller p_T is discarded. Continue until all particles are
clustered or discarded. In the publicly available pruning code, D_cut is determined from
another parameter, R_cut^factor, by D_cut = 2 R_cut^factor m_p / p_{Tp}.



3. Trimming [27]: For a given jet, we recluster it using the k_t algorithm with radius R_sub to
identify the subjets. We then discard subjets i with

$$p_{T,i} < f_{\rm cut}\, p_{T,{\rm jet}}, \qquad (A3)$$

where p_{T,jet} is the p_T of the original jet. We see the difference between filtering and
trimming is that we keep a fixed number of subjets in filtering, while in trimming whether
we keep a subjet is determined by the subjet's p_T.
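The mass-drop test of Eq. (A1) and the trimming criterion of Eq. (A3) can be sketched on a toy
clustering history as follows. This is a simplified illustration, not the FastJet-based
implementation used in the analysis: the PseudoJet class, the way the clustering history is
stored in `parents`, and the parameter values μ = 0.67, y_cut = 0.09 and f_cut = 0.03 are
assumptions chosen for the example (the optimized values are those of Table 3).

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PseudoJet:
    """Toy jet record; `parents` holds the two subjets merged in the last clustering step."""
    pt: float
    eta: float
    phi: float
    mass: float
    parents: Optional[Tuple["PseudoJet", "PseudoJet"]] = None


def delta_r(a, b):
    """Distance in the eta-phi plane, with the azimuthal difference wrapped to [-pi, pi)."""
    dphi = (a.phi - b.phi + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(a.eta - b.eta, dphi)


def mass_drop(jet, mu, y_cut):
    """Decluster until Eq. (A1) is satisfied; return (j, j1, j2, R_filt) or None."""
    j = jet
    while j.parents is not None and j.mass > 0.0:
        j1, j2 = sorted(j.parents, key=lambda s: s.mass, reverse=True)  # m_j1 > m_j2
        dr = delta_r(j1, j2)
        y = min(j1.pt, j2.pt) ** 2 * dr ** 2 / j.mass ** 2
        if j1.mass < mu * j.mass and y > y_cut:
            return j, j1, j2, min(0.3, dr / 2.0)  # radius used for the filtering step
        j = j1  # otherwise follow the heavier subjet
    return None


def trim(subjets, jet_pt, f_cut):
    """Keep only subjets that are not discarded by Eq. (A3)."""
    return [s for s in subjets if s.pt >= f_cut * jet_pt]


if __name__ == "__main__":
    sub1 = PseudoJet(pt=300.0, eta=0.00, phi=0.0, mass=10.0)
    sub2 = PseudoJet(pt=220.0, eta=0.35, phi=0.0, mass=8.0)
    w_like = PseudoJet(pt=520.0, eta=0.15, phi=0.0, mass=80.0, parents=(sub1, sub2))
    print(mass_drop(w_like, mu=0.67, y_cut=0.09)[3])
```

The subsequent filtering step (reclustering with R_filt and keeping the three hardest subjets)
requires a full jet algorithm and is therefore left out of this sketch.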

All three grooming algorithms involve two parameters in addition to the initial jet radius
R. In our analysis, we fix R = 1.2 and scan the other parameters to maximize ε_S/√ε_B in the
mass window (60, 100) GeV. As examples, the significance gains for pruning and trimming
are shown in Figure 26 for jet p_T ∈ (500, 550) GeV. The optimal parameters for all p_T bins
we consider are given in Table 3.